Spatial Feature Clustering — DLPFC-151673

Sample Tissue — DLPFC-151673

H&E stained tissue section from the Visium slide. This image identifies the analyzed sample and provides anatomical context for interpreting the spatial expression patterns discovered by the clustering pipeline.
H&E tissue image
Note — Gene-Level Clustering
This pipeline clusters genes into co-expression modules based on their spatial expression patterns across all Visium spots. This is fundamentally different from 10x Genomics Space Ranger, which clusters spots (tissue regions / cell types). The gene modules identified here represent groups of genes with similar spatial expression structure, not tissue domains.

Executive Summary

This report summarizes the results of a full spatial feature clustering pipeline run on 10x Genomics Visium transcriptomics data. The pipeline clusters genes (not spots/cells) into co-expression modules in four steps: it selects spatially variable genes, computes four similarity representations (expression, spatial, mixture of Gaussians (MoG), and a weighted combination of the three), clusters genes under each representation, and evaluates cluster quality using both internal metrics and spatial coherence.

Pipeline execution completed: 6/6 notebooks succeeded (all passed).

Optimal Parameters (Sensitivity Analysis)

α (Expression): 0.00
β (Spatial): 0.40
γ (MoG): 0.60
Silhouette: 0.125
Resolution: 0.80

Configuration

dataset_path: data/DLPFC-151673
clustering:
  method: louvain
  resolution: 0.8
  random_state: 0
  optimize_resolution: False
  resolution_range: [0.3, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.2, 1.5, 2.0]
similarity:
  weights:
    alpha: 0.0
    beta: 0.4
    gamma: 0.6
preprocessing:
  n_top_genes: 300
  min_gene_expression: 300
  n_top_genes_hvg: 3000
  pca_components: 50
evaluation:
  n_neighbors: 6
  use_pca_for_coherence: True
_updated:
  session: run_2026-02-10_04-27-21
  date: 2026:02:10 04:27:21
  note: Auto-updated from sensitivity analysis (notebook 06)

Execution Status

Notebook                               Status
01_explore_dataset.ipynb               SUCCESS
02_baseline.ipynb                      SUCCESS
03_spatial_weighted_similarity.ipynb   SUCCESS
04_multiview_clustering.ipynb          SUCCESS
05_final_plots.ipynb                   SUCCESS
06_sensitivity_analysis.ipynb          SUCCESS

Baseline Metrics

Baseline clustering uses expression-only similarity matrices (Pearson, Spearman, Cosine) without any spatial filtering. These metrics serve as the reference point against which spatially informed clusterings are compared.
What is Silhouette?

The Silhouette Score measures how similar each gene is to its own cluster compared to other clusters. Values range from −1 (wrong cluster) to +1 (well-matched). A score near 0 indicates overlapping clusters.
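
As a rough illustration of the definition above, the following dependency-free sketch computes a mean silhouette from a precomputed distance matrix. It is not the pipeline's code: the function name, the toy data, and the idea of deriving gene-gene distances as 1 − similarity are assumptions made here. In practice `sklearn.metrics.silhouette_score` with `metric="precomputed"` would be the usual choice.

```python
def silhouette_score(dist, labels):
    """Mean silhouette over all items for a precomputed distance matrix.

    Assumes at least two clusters are present.
    """
    n = len(labels)
    clusters = sorted(set(labels))
    scores = []
    for i in range(n):
        # Mean distance from item i to the members of each cluster (excluding i).
        mean_to = {}
        for c in clusters:
            members = [j for j in range(n) if labels[j] == c and j != i]
            if members:
                mean_to[c] = sum(dist[i][j] for j in members) / len(members)
        own = labels[i]
        if own not in mean_to:
            scores.append(0.0)  # singleton cluster: silhouette is 0 by convention
            continue
        a = mean_to[own]                                    # cohesion
        b = min(v for c, v in mean_to.items() if c != own)  # separation
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / n


# Toy example (hypothetical data): four items on a line forming two groups.
positions = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in positions] for p in positions]
good = silhouette_score(dist, [0, 0, 1, 1])  # well-matched labels, close to +1
bad = silhouette_score(dist, [0, 1, 0, 1])   # mismatched labels, negative
```

Seen this way, the scores reported below (roughly 0.11 to 0.13) sit much closer to the "overlapping clusters" end of the scale than to +1.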

What is Calinski-Harabasz?

The Calinski-Harabasz Index (Variance Ratio Criterion) is the ratio of between-cluster dispersion to within-cluster dispersion. Higher values indicate denser, better-separated clusters.
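
The ratio described above can be written out directly. The sketch below is a minimal, dependency-free version that operates on feature vectors; the function name and toy data are hypothetical, and the report does not say which feature space the pipeline uses for this metric. `sklearn.metrics.calinski_harabasz_score` is the standard library equivalent.

```python
def calinski_harabasz(points, labels):
    """Between-cluster over within-cluster dispersion, each scaled by its
    degrees of freedom: (B / (k - 1)) / (W / (n - k))."""
    n, dim = len(points), len(points[0])
    clusters = sorted(set(labels))
    k = len(clusters)

    def centroid(rows):
        return [sum(r[d] for r in rows) / len(rows) for d in range(dim)]

    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    overall = centroid(points)
    between = within = 0.0
    for c in clusters:
        rows = [p for p, lab in zip(points, labels) if lab == c]
        mu = centroid(rows)
        between += len(rows) * sq_dist(mu, overall)   # B: centroid spread
        within += sum(sq_dist(p, mu) for p in rows)   # W: scatter inside cluster
    return (between / (k - 1)) / (within / (n - k))


# Toy example (hypothetical data): two tight, well-separated groups.
points = [[0.0], [1.0], [10.0], [11.0]]
ch_good = calinski_harabasz(points, [0, 0, 1, 1])
ch_bad = calinski_harabasz(points, [0, 1, 0, 1])
```

The index has no fixed upper bound, so it is most useful for comparing clusterings of the same data rather than as an absolute score.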

What is Davies-Bouldin?

The Davies-Bouldin Index measures the average similarity between each cluster and the one most similar to it. Lower values indicate better separation.
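
A minimal sketch of this index, assuming Euclidean distances on feature vectors (the function name and toy data are hypothetical; `sklearn.metrics.davies_bouldin_score` is the standard implementation). For each cluster it finds the most similar other cluster, where similarity is the sum of the two within-cluster scatters divided by the distance between their centroids, and then averages those worst cases.

```python
import math


def davies_bouldin(points, labels):
    """Average, over clusters, of the worst-case (scatter_i + scatter_j) / d_ij.

    Assumes at least two clusters with distinct centroids.
    """
    dim = len(points[0])
    clusters = sorted(set(labels))
    centroids, scatter = {}, {}
    for c in clusters:
        rows = [p for p, lab in zip(points, labels) if lab == c]
        mu = [sum(r[d] for r in rows) / len(rows) for d in range(dim)]
        centroids[c] = mu
        # Scatter: mean distance of the cluster's members to its centroid.
        scatter[c] = sum(math.dist(p, mu) for p in rows) / len(rows)
    worst = [
        max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
        for i in clusters
    ]
    return sum(worst) / len(clusters)


# Toy example (hypothetical data): lower is better.
points = [[0.0], [1.0], [10.0], [11.0]]
db_good = davies_bouldin(points, [0, 0, 1, 1])
db_bad = davies_bouldin(points, [0, 1, 0, 1])
```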

similarity   silhouette   calinski_harabasz   davies_bouldin
pearson      0.1102       17.2363             2.3873
spearman     -0.2524      8.3527              3.2220
cosine       0.1183       17.8707             2.3529

Multi-View Comparison

The multi-view analysis clusters genes under four different similarity representations and then measures pairwise agreement. High ARI/NMI between views confirms that the core gene modules are robust; divergences highlight genes whose grouping is sensitive to spatial context.

Adjusted Rand Index (ARI) Matrix

The Adjusted Rand Index (ARI) quantifies the agreement between two clusterings, adjusted for chance. A value of +1 indicates perfect agreement, values near 0 indicate chance-level agreement, and negative values indicate agreement worse than chance. An ARI ≥ 0.8 is generally considered strong agreement.
             expression   spatial   mog     weighted
expression   1.000        0.871     0.934   0.934
spatial      0.871        1.000     0.909   0.934
mog          0.934        0.909     1.000   0.973
weighted     0.934        0.934     0.973   1.000
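
Each entry of the ARI matrix compares two label vectors over the same genes. The sketch below shows one way to compute a single entry from the pair-counting definition; it is a dependency-free illustration with a hypothetical function name, not the pipeline's code, and `sklearn.metrics.adjusted_rand_score` would normally be used instead.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(a, b):
    """Chance-adjusted agreement between two labelings of the same items."""
    n = len(a)
    # Pairs of items grouped together in both labelings, in a only, in b only.
    together_both = sum(comb(v, 2) for v in Counter(zip(a, b)).values())
    together_a = sum(comb(v, 2) for v in Counter(a).values())
    together_b = sum(comb(v, 2) for v in Counter(b).values())
    expected = together_a * together_b / comb(n, 2)
    max_index = (together_a + together_b) / 2
    if max_index == expected:  # degenerate case, e.g. a single cluster in both
        return 1.0
    return (together_both - expected) / (max_index - expected)


# Cluster ids are arbitrary: swapping the names 0 and 1 still gives perfect agreement.
same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# Labelings that cut across each other score below zero.
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

Because the index is invariant to relabeling, views that number their clusters differently can still be compared directly.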

Normalized Mutual Information (NMI) Matrix

The Normalized Mutual Information (NMI) measures the mutual dependence between two clusterings, normalized to [0, 1]. A value of 1 means the clusterings are identical; 0 means they share no information.
             expression   spatial   mog     weighted
expression   1.000        0.819     0.892   0.892
spatial      0.819        1.000     0.848   0.893
mog          0.892        0.848     1.000   0.941
weighted     0.892        0.893     0.941   1.000
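
For reference, a single NMI entry can be computed as below. This is a sketch under an assumption: it normalizes by the arithmetic mean of the two label entropies, which is the default of `sklearn.metrics.normalized_mutual_info_score`, but the report does not state which normalization the pipeline used. The function name is hypothetical.

```python
from collections import Counter
from math import log


def normalized_mutual_info(a, b):
    """Mutual information of two labelings divided by their mean entropy."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))

    def entropy(counts):
        return -sum(c / n * log(c / n) for c in counts.values())

    mutual_info = sum(
        c / n * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in count_ab.items()
    )
    mean_entropy = (entropy(count_a) + entropy(count_b)) / 2
    # If both labelings have a single cluster, treat them as identical.
    return mutual_info / mean_entropy if mean_entropy > 0 else 1.0


identical = normalized_mutual_info([0, 0, 1, 1], [1, 1, 0, 0])
independent = normalized_mutual_info([0, 0, 1, 1], [0, 1, 0, 1])
```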

Cluster Summary

Summary of the final weighted clustering. Size is the number of genes assigned to each cluster. Spatial Coherence (Moran's I) quantifies how spatially structured the average expression profile of each cluster is on the Visium tissue.
What is Moran's I?

The Moran's I statistic measures spatial autocorrelation — the degree to which nearby spots on the Visium slide share similar gene-expression patterns. Values near +1 indicate strong spatial clustering; near 0 indicates randomness; near −1 indicates dispersion.

Cluster   Size (# genes)   Spatial Coherence (Moran's I)
0         126              0.9234
1         174              0.6728
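
The statistic in the table can be illustrated with a small sketch. It assumes binary neighbor weights given as adjacency lists (each spot lists the indices of its neighboring spots); the report does not specify how the pipeline defines spatial weights, so the function name, the weighting, and the toy data are all assumptions.

```python
def morans_i(values, neighbors):
    """Moran's I with binary weights.

    values[i]    : value at spot i (e.g. a cluster's average expression)
    neighbors[i] : indices of the spots adjacent to spot i
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(len(nb) for nb in neighbors)
    # Cross-products of deviations over all neighboring (i, j) pairs.
    cross = sum(dev[i] * dev[j] for i, nb in enumerate(neighbors) for j in nb)
    variance_sum = sum(d * d for d in dev)
    return (n / total_weight) * cross / variance_sum


# Toy example (hypothetical): six spots in a row, each adjacent to its neighbors.
chain = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
smooth = morans_i([0, 0, 0, 1, 1, 1], chain)       # contiguous domain, positive
alternating = morans_i([0, 1, 0, 1, 0, 1], chain)  # checkerboard, negative
```

A profile that forms one contiguous domain scores positive, while a spot-to-spot alternating profile scores negative, matching the interpretation given above.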

Sensitivity Analysis

Sensitivity analysis verifies that the chosen parameters are robust. If small changes to weights or resolution cause large shifts in cluster assignments (low ARI), the result is fragile. Stable, high ARI across a wide parameter range indicates a trustworthy clustering.
Combinations tested: 24
Robust (ARI ≥ 0.8): 24/24
Mean ARI: 0.972
Best Silhouette: 0.125
Cluster counts seen: 2

Top 5 Weight Combinations (by Silhouette)

α     β     γ     Silhouette   ARI vs. baseline   # Clusters
0.0   0.4   0.6   0.125        0.987              2
0.6   0.2   0.2   0.125        0.987              2
0.2   0.2   0.6   0.125        0.987              2
0.4   0.2   0.4   0.125        0.987              2
0.0   0.5   0.5   0.124        1.000              2
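
The robustness figures in this section (count of combinations with ARI ≥ 0.8, mean ARI) are simple aggregates over per-combination ARI values. A minimal sketch follows; the helper name is hypothetical, and because the report lists only the five top-ranked rows rather than all 24 combinations, the example summarizes just those five and will not reproduce the overall mean of 0.972.

```python
def summarize_robustness(ari_values, threshold=0.8):
    """Return (number robust, number tested, mean ARI)."""
    robust = sum(1 for v in ari_values if v >= threshold)
    return robust, len(ari_values), sum(ari_values) / len(ari_values)


# ARI vs. baseline for the five top-ranked weight combinations listed above.
top5_ari = [0.987, 0.987, 0.987, 0.987, 1.000]
robust, total, mean_ari = summarize_robustness(top5_ari)
print(f"Robust: {robust}/{total}, mean ARI = {mean_ari:.3f}")
```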

Saved Data Arrays

Summary statistics for all numpy arrays stored during the pipeline run. These files contain raw similarity matrices, cluster label vectors, selected gene indices, and intermediate results.
Name                           Shape       Dtype     Min       Max          Mean
baseline_labels_cosine         (300,)      int32     0.0000    1.0000       0.5967
baseline_labels_pearson        (300,)      int32     0.0000    1.0000       0.5800
baseline_labels_spearman       (300,)      int32     0.0000    2.0000       1.1267
baseline_similarity_cosine     300 x 300   float32   0.0000    0.9292       0.2287
baseline_similarity_pearson    300 x 300   float64   -0.2350   0.9017       0.0773
baseline_similarity_spearman   300 x 300   float64   -0.2673   0.7246       0.0630
baseline_top_genes             (300,)      int64     27.0000   33494.0000   16869.6200
cluster_labels                 (300,)      int32     0.0000    1.0000       0.5800
cluster_labels_expression      (300,)      int32     0.0000    1.0000       0.5967
cluster_labels_mog             (300,)      int32     0.0000    1.0000       0.5800
cluster_labels_spatial         (300,)      int32     0.0000    1.0000       0.5633
cluster_labels_weighted        (300,)      int32     0.0000    1.0000       0.5800
similarity_expression          300 x 300   float32   0.0000    0.9292       0.2287
similarity_matrix              300 x 300   float64   0.0000    0.9513       0.3855
similarity_mog                 300 x 300   float64   0.0000    0.9747       0.3125
similarity_spatial             300 x 300   float64   0.0000    0.9950       0.6956
similarity_weighted            300 x 300   float64   0.0000    0.9513       0.3855
top_genes                      (300,)      int64     27.0000   33494.0000   16869.6200
top_genes_multiview            (300,)      int64     27.0000   33494.0000   16869.6200

Visualizations

All visualizations generated during the pipeline run are collected below. Each figure is accompanied by an explanation of what it shows and how to interpret it.

Notebook 01 — Data Exploration

Distribution of Total Counts per Gene

Histogram of the total counts aggregated per gene across all spots. Most genes have very low total counts; a small number of highly expressed genes dominate. This motivates filtering to the top spatially variable genes.
nb01_dist_counts_per_gene

Distribution of Total Counts per Spot

Histogram of the total UMI (unique molecular identifier) counts per spot. Spots with very low counts may indicate empty or low-quality capture areas. A unimodal distribution with a long right tail is typical for Visium data.
nb01_dist_counts_per_spot

Distribution of Detected Genes per Spot

Histogram of the number of distinct genes detected (count > 0) per spot. This is a key QC metric — spots with unusually few detected genes may lie outside the tissue or suffer from capture failure.
nb01_dist_genes_per_spot

Gene Diagnostic (Exploration) — 17754

Full diagnostic plot for a randomly sampled gene from notebook 01. Shows: raw spatial expression, mean-filtered spatial map, non-zero value histogram, ordered CDF, GMM fit on filtered data, and MoG-binarized spatial map. This panel reveals whether the gene has a clear spatial domain (visible in the MoG panel) or is diffusely expressed.
nb01_gene_diag_17754

Gene Diagnostic (Exploration) — 22295

Full diagnostic plot for a randomly sampled gene from notebook 01. Shows: raw spatial expression, mean-filtered spatial map, non-zero value histogram, ordered CDF, GMM fit on filtered data, and MoG-binarized spatial map. This panel reveals whether the gene has a clear spatial domain (visible in the MoG panel) or is diffusely expressed.
nb01_gene_diag_22295

Gene Diagnostic (Exploration) — 22690

Full diagnostic plot for a randomly sampled gene from notebook 01. Shows: raw spatial expression, mean-filtered spatial map, non-zero value histogram, ordered CDF, GMM fit on filtered data, and MoG-binarized spatial map. This panel reveals whether the gene has a clear spatial domain (visible in the MoG panel) or is diffusely expressed.
nb01_gene_diag_22690

Gene Diagnostic (Exploration) — 27957

Full diagnostic plot for a randomly sampled gene from notebook 01. Shows: raw spatial expression, mean-filtered spatial map, non-zero value histogram, ordered CDF, GMM fit on filtered data, and MoG-binarized spatial map. This panel reveals whether the gene has a clear spatial domain (visible in the MoG panel) or is diffusely expressed.
nb01_gene_diag_27957

Gene Diagnostic (Exploration) — 6611

Full diagnostic plot for a randomly sampled gene from notebook 01. Shows: raw spatial expression, mean-filtered spatial map, non-zero value histogram, ordered CDF, GMM fit on filtered data, and MoG-binarized spatial map. This panel reveals whether the gene has a clear spatial domain (visible in the MoG panel) or is diffusely expressed.
nb01_gene_diag_6611

Known Marker Gene — MOBP

Diagnostic plot for a known marker gene (e.g., MOBP for oligodendrocytes in DLPFC). The spatial pattern should match known tissue layer anatomy, serving as a sanity check for data quality and spatial alignment.
nb01_marker_MOBP

Tissue & Spot Layout

Visium spots overlaid on the H&E tissue image, colored by total UMI counts per spot. Brighter spots indicate higher sequencing depth. This overview confirms spot coordinates and reveals the overall tissue morphology alongside the expression intensity landscape.
nb01_tissue_spots

Notebook 02 — Baseline Clustering

Baseline Cluster Representative — 0 gene 6546

Diagnostic plot for a representative gene from a baseline (expression-only Pearson) cluster. The spatial pattern illustrates the type of expression structure captured without any spatial filtering.
nb02_baseline_cluster_0_gene_6546

Baseline Cluster Representative — 1 gene 12035

Diagnostic plot for a representative gene from a baseline (expression-only Pearson) cluster. The spatial pattern illustrates the type of expression structure captured without any spatial filtering.
nb02_baseline_cluster_1_gene_12035

Baseline Similarity Matrices (Pearson / Spearman / Cosine)

Heatmaps of the three expression-only gene-gene similarity matrices. Pearson captures linear correlation, Spearman captures rank-order correlation, and Cosine measures directional similarity. Block structure indicates gene modules detectable from expression alone.
nb02_baseline_similarity_matrices
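
To make the distinction between the three measures concrete, the sketch below computes each one for a pair of expression vectors (one value per spot). It is a dependency-free illustration with hypothetical function names and toy data, not the pipeline's implementation; it assumes neither vector is constant or all-zero.

```python
import math


def pearson(x, y):
    """Linear correlation of two equal-length vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    return num / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def ranks(v):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(v)), key=v.__getitem__)
    result = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Rank-order correlation: Pearson applied to the ranks."""
    return pearson(ranks(x), ranks(y))


def cosine(x, y):
    """Cosine of the angle between the raw (uncentered) vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))


# Toy example (hypothetical): y increases with x, but not linearly.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 4.0, 9.0, 16.0]
r_pearson, r_spearman, r_cosine = pearson(x, y), spearman(x, y), cosine(x, y)
```

For this monotone but nonlinear pair, Spearman is exactly 1 while Pearson and cosine fall slightly below it. Cosine is also never negative for non-negative inputs such as counts, which is consistent with the 0.0000 minimum reported for the cosine matrix in Saved Data Arrays.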

Notebook 03 — Spatial Weighted Similarity

Weighted Cluster Representative — 0 gene 12035

Diagnostic plot for a representative gene from the spatially weighted clustering (NB03). Compared to baseline, the MoG panel should show cleaner spatial domains thanks to the inclusion of spatial and MoG similarity components.
nb03_weighted_cluster_0_gene_12035

Weighted Cluster Representative — 1 gene 23765

Diagnostic plot for a representative gene from the spatially weighted clustering (NB03). Compared to baseline, the MoG panel should show cleaner spatial domains thanks to the inclusion of spatial and MoG similarity components.
nb03_weighted_cluster_1_gene_23765

Spatial Weighted Similarity Matrix (NB03)

Heatmap of the combined similarity matrix computed with weights α·Expr + β·Spatial + γ·MoG in notebook 03. This is the primary similarity used for the initial weighted clustering.
nb03_weighted_similarity_heatmap
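
The combination α·Expr + β·Spatial + γ·MoG is an element-wise weighted sum of three gene-gene matrices. A minimal sketch follows, assuming the three matrices share the same gene ordering and are already on comparable scales; the function name and the 2 x 2 toy matrices are hypothetical, and the weights used in the example are the optimal values reported in this document, which are not necessarily the weights notebook 03 ran with.

```python
def combine_similarities(expr, spatial, mog, alpha, beta, gamma):
    """Element-wise alpha*expr + beta*spatial + gamma*mog for square matrices."""
    n = len(expr)
    return [
        [
            alpha * expr[i][j] + beta * spatial[i][j] + gamma * mog[i][j]
            for j in range(n)
        ]
        for i in range(n)
    ]


# Toy 2 x 2 similarity matrices (hypothetical values).
expr = [[1.0, 0.2], [0.2, 1.0]]
spatial = [[1.0, 0.5], [0.5, 1.0]]
mog = [[1.0, 1.0], [1.0, 1.0]]

weighted = combine_similarities(expr, spatial, mog, alpha=0.0, beta=0.4, gamma=0.6)
```

When the weights sum to 1 and each input has a unit diagonal, the combined matrix keeps a unit diagonal and stays within the range of its inputs.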

Notebook 04 — Multi-View Clustering

ARI / NMI Inter-View Comparison (NB04)

Pairwise comparison of the four clustering views (expression, spatial, MoG, weighted) using ARI (left) and NMI (right). High off-diagonal values mean the two views largely agree on gene groupings; lower values reveal genes reclassified when spatial or MoG information is introduced.
nb04_ari_nmi_heatmaps

View-Switching Gene (NB04) — 12035

Diagnostic plot for a gene that changed cluster between the two most different views. These genes sit at the boundary between modules and are biologically interesting — their grouping depends on whether spatial context is considered.
nb04_changed_gene_12035

View-Switching Gene (NB04) — 23259

Diagnostic plot for a gene that changed cluster between the two most different views. These genes sit at the boundary between modules and are biologically interesting — their grouping depends on whether spatial context is considered.
nb04_changed_gene_23259

View-Switching Gene (NB04) — 28749

Diagnostic plot for a gene that changed cluster between the two most different views. These genes sit at the boundary between modules and are biologically interesting — their grouping depends on whether spatial context is considered.
nb04_changed_gene_28749

View-Switching Gene (NB04) — 9104

Diagnostic plot for a gene that changed cluster between the two most different views. These genes sit at the boundary between modules and are biologically interesting — their grouping depends on whether spatial context is considered.
nb04_changed_gene_9104

View-Switching Gene (NB04) — 9816

Diagnostic plot for a gene that changed cluster between the two most different views. These genes sit at the boundary between modules and are biologically interesting — their grouping depends on whether spatial context is considered.
nb04_changed_gene_9816

Multi-View Representative — expression gene 6546

Diagnostic plot for a representative gene from one of the four multi-view clusterings (expression, spatial, MoG, or weighted). Comparing across views reveals how different similarity representations emphasize different spatial patterns.
nb04_view_expression_gene_6546

Multi-View Representative — mog gene 12035

Diagnostic plot for a representative gene from one of the four multi-view clusterings (expression, spatial, MoG, or weighted). Comparing across views reveals how different similarity representations emphasize different spatial patterns.
nb04_view_mog_gene_12035

Multi-View Representative — spatial gene 12035

Diagnostic plot for a representative gene from one of the four multi-view clusterings (expression, spatial, MoG, or weighted). Comparing across views reveals how different similarity representations emphasize different spatial patterns.
nb04_view_spatial_gene_12035

Multi-View Representative — weighted gene 12035

Diagnostic plot for a representative gene from one of the four multi-view clusterings (expression, spatial, MoG, or weighted). Comparing across views reveals how different similarity representations emphasize different spatial patterns.
nb04_view_weighted_gene_12035

Notebook 05 — Final Plots

Similarity Matrices Overview (Publication)

Side-by-side heatmaps of the four gene-gene similarity matrices used in multi-view clustering: Expression (Pearson on raw counts), Spatial (after mean filter), MoG (binarized), and Weighted (α·Expr + β·Spatial + γ·MoG). Block-diagonal structure indicates clear gene modules.
all_similarity_matrices

ARI / NMI Matrices (Publication)

Publication-quality pairwise comparison matrices. ARI (left) and NMI (right) between all four clustering views. Values close to 1.0 confirm that the gene modules are robust across representations.
ari_nmi_matrices

Spatial Coherence per Cluster (Moran's I)

Bar chart of the average Moran's I for each gene cluster. A tall bar means the genes in that cluster have spatially coherent expression profiles across the Visium tissue, which supports the interpretation that the cluster captures a spatially structured, and potentially biologically meaningful, pattern.
spatial_coherence_bar

Weighted Similarity Matrix (Publication)

Detailed view of the final weighted similarity matrix that combines expression, spatial, and MoG components with weights α, β, γ. Genes are ordered by cluster assignment; the visible block structure confirms well-defined gene modules.
weighted_similarity_matrix

View-Switching Gene (Final) — 11009

Diagnostic plot for a gene that changed cluster between the two most different views, generated during the final analysis. These genes are candidates for further biological investigation.
nb05_changed_gene_11009

View-Switching Gene (Final) — 12035

Diagnostic plot for a gene that changed cluster between the two most different views, generated during the final analysis. These genes are candidates for further biological investigation.
nb05_changed_gene_12035

View-Switching Gene (Final) — 18155

Diagnostic plot for a gene that changed cluster between the two most different views, generated during the final analysis. These genes are candidates for further biological investigation.
nb05_changed_gene_18155

View-Switching Gene (Final) — 20841

Diagnostic plot for a gene that changed cluster between the two most different views, generated during the final analysis. These genes are candidates for further biological investigation.
nb05_changed_gene_20841

View-Switching Gene (Final) — 23259

Diagnostic plot for a gene that changed cluster between the two most different views, generated during the final analysis. These genes are candidates for further biological investigation.
nb05_changed_gene_23259

Cluster Spatial Profiles

Spatial distribution of gene clusters on the tissue. Each panel shows one cluster's average expression across all Visium spots, overlaid on the H&E tissue image. Moran's I quantifies spatial coherence — higher values indicate the gene module is spatially structured rather than randomly distributed.
nb05_cluster_spatial_profiles

Final Cluster Representative — 0 gene 12035

Publication-quality diagnostic plot for a representative gene from the final weighted clustering. The six panels provide a complete picture of the gene's expression distribution and spatial structure.
nb05_final_cluster_0_gene_12035

Final Cluster Representative — 1 gene 23765

Publication-quality diagnostic plot for a representative gene from the final weighted clustering. The six panels provide a complete picture of the gene's expression distribution and spatial structure.
nb05_final_cluster_1_gene_23765

Notebook 06 — Sensitivity Analysis

Optimized vs. Fixed Resolution

Comparison of clustering outcomes when using the automatically optimized resolution versus the default fixed value (1.0). If both produce similar ARI / cluster counts, the default resolution is already near-optimal for this dataset.
resolution_comparison

Resolution Optimization

Grid search over the Louvain/Leiden resolution parameter. The left panel shows Silhouette score vs. resolution — the peak identifies the resolution that produces the most internally cohesive and well-separated clusters. The right panel shows how the number of clusters grows with resolution.
resolution_optimization

Weight Sensitivity Analysis

Heatmaps showing how cluster quality varies across different combinations of the three similarity weights: α (expression), β (spatial), and γ (MoG). The left heatmap displays ARI relative to the baseline, while the right shows Silhouette score. Regions of uniformly high ARI indicate robust weight ranges where the clustering is stable.
weight_sensitivity_heatmaps

PCA Variance Explained

Left: scree plot showing the variance explained by each principal component; the rapid drop-off identifies the intrinsic dimensionality of the data. Right: cumulative variance with the 95% threshold (red) and the elbow point (orange). Retaining ≥ 95% of the variance ensures that downstream neighbor computation operates on a faithful low-dimensional representation.
pca_variance_explained
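
The 95% threshold in the right panel corresponds to the smallest number of components whose cumulative explained-variance ratio reaches 0.95. A minimal sketch of that lookup is shown below; the function name and the example ratios are hypothetical and are not values from this run.

```python
from itertools import accumulate


def components_for_variance(explained_ratio, threshold=0.95):
    """Smallest number of leading components whose cumulative explained
    variance reaches the threshold, or None if it is never reached."""
    for k, total in enumerate(accumulate(explained_ratio), start=1):
        if total >= threshold:
            return k
    return None


# Hypothetical per-component variance ratios, largest first.
ratios = [0.5, 0.25, 0.125, 0.0625, 0.0625]
n_components = components_for_variance(ratios)
```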

Session Artifacts

All raw data files (numpy arrays, CSV tables, log files) are stored in the session directory: run_2026-02-10_05-29-59